Multi-Layer Perceptron, MNIST


In this notebook, we will train an MLP to classify images from the MNIST database hand-written digit database.

The process will be broken down into the following steps:

  1. Load and visualize the data
  2. Define a neural network
  3. Train the model
  4. Evaluate the performance of our trained model on a test dataset!

Before we begin, we have to import the necessary libraries for working with data and PyTorch.


In [1]:
# import libraries
import torch
import numpy as np

Load and Visualize the Data

Downloading may take a few moments, and you should see your progress as the data is loading. You may also choose to change the batch_size if you want to load more data at a time.

This cell will create DataLoaders for each of our datasets.


In [2]:
from torchvision import datasets
import torchvision.transforms as transforms

# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 20

# convert data to torch.FloatTensor
transform = transforms.ToTensor()

# choose the training and test datasets
train_data = datasets.MNIST(root='data',
                            train=True,
                            download=True,
                            transform=transform)
test_data = datasets.MNIST(root='data',
                           train=False,
                           download=True,
                           transform=transform)

# prepare data loaders
train_loader = torch.utils.data.DataLoader(dataset=train_data,
                                           batch_size=batch_size,
                                           num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(dataset=test_data,
                                          batch_size=batch_size, 
                                          num_workers=num_workers)

Visualize a Batch of Training Data

The first step in a classification task is to take a look at the data, make sure it is loaded in correctly, then make any initial observations about patterns in that data.


In [3]:
import matplotlib.pyplot as plt
%matplotlib inline
    
# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = dataiter.next()
images = images.numpy()

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    ax = fig.add_subplot(2, 20/2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    # print out the correct label for each image
    # .item() gets the value contained in a Tensor
    ax.set_title(str(labels[idx].item()))


View an Image in More Detail


In [4]:
img = np.squeeze(images[1])

fig = plt.figure(figsize = (12,12)) 
ax = fig.add_subplot(111)
ax.imshow(img, cmap='gray')
width, height = img.shape
thresh = img.max()/2.5
for x in range(width):
    for y in range(height):
        val = round(img[x][y],2) if img[x][y] !=0 else 0
        ax.annotate(str(val), xy=(y,x),
                    horizontalalignment='center',
                    verticalalignment='center',
                    color='white' if img[x][y]<thresh else 'black')



Define the Network Architecture

The architecture will be responsible for seeing as input a 784-dim Tensor of pixel values for each image, and producing a Tensor of length 10 (our number of classes) that indicates the class scores for an input image. This particular example uses two hidden layers and dropout to avoid overfitting.


In [5]:
import torch.nn as nn
import torch.nn.functional as F

## TODO: Define the NN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Linear layer (784 -> 128 hidden nodes)
        self.fc1 = nn.Linear(in_features=(28 * 28),
                             out_features=128)
        self.fc2 = nn.Linear(in_features=128,
                             out_features=10)

    def forward(self, x):
        # flatten image input
        x = x.view(-1, 28 * 28)
        # add hidden layer, with relu activation function
        x = F.relu(self.fc1(x))
        x = F.softmax(self.fc2(x), dim=1)
        return x

# initialize the NN
model = Net()
print(model)


Net(
  (fc1): Linear(in_features=784, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=10, bias=True)
)

Specify Loss Function and Optimizer

It's recommended that you use cross-entropy loss for classification. If you look at the documentation (linked above), you can see that PyTorch's cross entropy function applies a softmax funtion to the output layer and then calculates the log loss.


In [6]:
## TODO: Specify loss and optimization functions

# specify loss function
criterion = nn.CrossEntropyLoss()

# specify optimizer
optimizer = torch.optim.Adam(params=model.parameters(),
                             lr=0.001)

In [7]:
optimizer


Out[7]:
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.001
    weight_decay: 0
)

Train the Network

The steps for training/learning from a batch of data are described in the comments below:

  1. Clear the gradients of all optimized variables
  2. Forward pass: compute predicted outputs by passing inputs to the model
  3. Calculate the loss
  4. Backward pass: compute gradient of the loss with respect to model parameters
  5. Perform a single optimization step (parameter update)
  6. Update average training loss

The following loop trains for 30 epochs; feel free to change this number. For now, we suggest somewhere between 20-50 epochs. As you train, take a look at how the values for the training loss decrease over time. We want it to decrease while also avoiding overfitting the training data.


In [8]:
# Number of epochs to train the model
n_epochs = 50  # suggest training between 20-50 epochs

# Prep model for training
model.train() 

for epoch in range(n_epochs):
    # Monitor training loss
    train_loss = 0.0
    
    ###################
    # train the model #
    ###################
    for data, target in train_loader:
        # Clear the gradients of all optimized variables
        optimizer.zero_grad()
        # Forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # Calculate the loss
        loss = criterion(output, target)
        # Backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # Perform a single optimization step (parameter update)
        optimizer.step()
        # Update running training loss
        train_loss += loss.item() * data.size(0)
        
    # Print training statistics 
    # Calculate average loss over an epoch
    train_loss = train_loss/len(train_loader.dataset)

    print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch+1, 
                                                     train_loss))


Epoch: 1 	Training Loss: 1.573228
Epoch: 2 	Training Loss: 1.517614
Epoch: 3 	Training Loss: 1.503269
Epoch: 4 	Training Loss: 1.495551
Epoch: 5 	Training Loss: 1.489638
Epoch: 6 	Training Loss: 1.485734
Epoch: 7 	Training Loss: 1.483105
Epoch: 8 	Training Loss: 1.480955
Epoch: 9 	Training Loss: 1.478819
Epoch: 10 	Training Loss: 1.477431
Epoch: 11 	Training Loss: 1.476430
Epoch: 12 	Training Loss: 1.475208
Epoch: 13 	Training Loss: 1.475078
Epoch: 14 	Training Loss: 1.473664
Epoch: 15 	Training Loss: 1.473421
Epoch: 16 	Training Loss: 1.472502
Epoch: 17 	Training Loss: 1.472015
Epoch: 18 	Training Loss: 1.471783
Epoch: 19 	Training Loss: 1.471400
Epoch: 20 	Training Loss: 1.470385
Epoch: 21 	Training Loss: 1.470628
Epoch: 22 	Training Loss: 1.470376
Epoch: 23 	Training Loss: 1.470209
Epoch: 24 	Training Loss: 1.469892
Epoch: 25 	Training Loss: 1.469523
Epoch: 26 	Training Loss: 1.469656
Epoch: 27 	Training Loss: 1.469097
Epoch: 28 	Training Loss: 1.469037
Epoch: 29 	Training Loss: 1.468982
Epoch: 30 	Training Loss: 1.468898
Epoch: 31 	Training Loss: 1.468536
Epoch: 32 	Training Loss: 1.468630
Epoch: 33 	Training Loss: 1.468143
Epoch: 34 	Training Loss: 1.468544
Epoch: 35 	Training Loss: 1.467989
Epoch: 36 	Training Loss: 1.468079
Epoch: 37 	Training Loss: 1.467830
Epoch: 38 	Training Loss: 1.467544
Epoch: 39 	Training Loss: 1.467501
Epoch: 40 	Training Loss: 1.467237
Epoch: 41 	Training Loss: 1.467651
Epoch: 42 	Training Loss: 1.467390
Epoch: 43 	Training Loss: 1.467518
Epoch: 44 	Training Loss: 1.467693
Epoch: 45 	Training Loss: 1.467182
Epoch: 46 	Training Loss: 1.466719
Epoch: 47 	Training Loss: 1.467148
Epoch: 48 	Training Loss: 1.466855
Epoch: 49 	Training Loss: 1.466981
Epoch: 50 	Training Loss: 1.467050

Test the Trained Network

Finally, we test our best model on previously unseen test data and evaluate it's performance. Testing on unseen data is a good way to check that our model generalizes well. It may also be useful to be granular in this analysis and take a look at how this model performs on each class as well as looking at its overall loss and accuracy.

model.eval()

model.eval() will set all the layers in your model to evaluation mode. This affects layers like dropout layers that turn "off" nodes during training with some probability, but should allow every node to be "on" for evaluation!


In [9]:
# initialize lists to monitor test loss and accuracy
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

# Prep model for *evaluation*
model.eval() 

for data, target in test_loader:
    # forward pass: compute predicted outputs by passing inputs to the model
    output = model(data)
    # calculate the loss
    loss = criterion(output, target)
    # update test loss 
    test_loss += loss.item()*data.size(0)
    # convert output probabilities to predicted class
    _, pred = torch.max(output, 1)
    # compare predictions to true label
    correct = np.squeeze(pred.eq(target.data.view_as(pred)))
    # calculate test accuracy for each object class
    for i in range(batch_size):
        label = target.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1

# calculate and print avg test loss
test_loss = test_loss/len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            str(i), 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))


Test Loss: 1.482544

Test Accuracy of     0: 99% (973/980)
Test Accuracy of     1: 98% (1123/1135)
Test Accuracy of     2: 97% (1011/1032)
Test Accuracy of     3: 97% (988/1010)
Test Accuracy of     4: 97% (960/982)
Test Accuracy of     5: 97% (874/892)
Test Accuracy of     6: 98% (942/958)
Test Accuracy of     7: 97% (1004/1028)
Test Accuracy of     8: 96% (936/974)
Test Accuracy of     9: 96% (973/1009)

Test Accuracy (Overall): 97% (9784/10000)

Visualize Sample Test Results

This cell displays test images and their labels in this format: predicted (ground-truth). The text will be green for accurately classified examples and red for incorrect predictions.


In [10]:
# obtain one batch of test images
dataiter = iter(test_loader)
images, labels = dataiter.next()

# get sample outputs
output = model(images)
# convert output probabilities to predicted class
_, preds = torch.max(output, 1)
# prep images for display
images = images.numpy()

# plot the images in the batch, along with predicted and true labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    ax = fig.add_subplot(2, 20/2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title("{} ({})".format(str(preds[idx].item()), str(labels[idx].item())),
                 color=("green" if preds[idx]==labels[idx] else "red"))